Data-moderate assessment approaches for Southwest Pacific Ocean (SWPO) striped marlin
1 Executive summary
2 Introduction
Assessment of Southwest Pacific Ocean (SWPO) striped marlin (Kajikia audax) has been challenging. As documented in the joint NOAA-SPC modeling meeting report (Ducharme-Barth et al., 2025), key issues identified by WCPFC SC20 included poor fits to size composition data and relative abundance indices, conflicts between different data sources, and difficulties in estimating model initial conditions.
Discussions at the 2025 SPC Pre-Assessment Workshop also highlighted challenges related to the estimation of total population scale, where the low biomass scale indicated by previous versions of the assessment appears largely driven by the size composition from multiple fisheries. While estimation of low biomass scale is not inherently problematic, the resulting high ratio of fishing mortality (F) to natural mortality (M) that is maintained over multiple decades in the face of relative stability in catches and catch-per-unit-effort (CPUE) is suspicious and potentially indicative of additional model mis-specification.
The complexity of integrated age-structured models, while providing detailed population dynamics representation, can become problematic when fundamental data conflicts exist or when key biological parameters are uncertain. For SWPO striped marlin, these challenges are compounded by spatiotemporal heterogeneity in fleet coverage, potential non-representativeness in mixed-fleet composition data, and limitations in age data from opportunistic sampling. In the current case, changes to productivity (growth, natural mortality, maturity, and/or steepness) or selectivity assumptions will impact biomass scaling through predictions of expected size composition data.
Scaling back model complexity to a data-moderate approach, like Bayesian surplus production models (BSPMs), offers analysts a simplified yet robust alternative for stock assessment when data limitations or conflicts present challenges for more complex models. These models have proven effective in recent pelagic fish stock assessments (Winker et al., 2018), and are routinely applied for WCPFC shark assessments (ISC, 2024; Neubauer et al., 2019, 2024). They also facilitate a more tractable exploration of model assumptions by distilling productivity and fishing assumptions into a restricted parameter subset. The BSPM framework allows for explicit incorporation of parameter uncertainty through informative priors while maintaining computational efficiency and interpretability.
One of the advantages of the Bayesian approach is through the use of prior information to propagate uncertainty in model assumptions, and biological simulation approaches can provide a robust methods for parameterizing key population dynamics parameters (ISC, 2024; Pardo et al., 2016). For species with complex life histories like striped marlin, these simulation-based priors can incorporate uncertainty in growth, maturity, natural mortality, and reproductive parameters to derive realistic distributions for maximum intrinsic rate of increase and carrying capacity. Additionally, prior pushforward checks can be used to further refine parameter distributions by identifying parameter combinations that produce clearly improbable outcomes.
This analysis presents a data-moderate approach for SWPO striped marlin that addresses some of the key technical challenges identified in previous assessments while providing a robust framework for stock status evaluation. The model incorporates biological uncertainty through simulation-based priors and includes approaches to address catch uncertainty and population depletion. It is important to note that this analysis complements the 2025 revised stock assessment report (Castillo-Jordan et al., 2025). While the relative simplicity of data-moderate approaches offer some advantages relative to more data-intensive approaches like fully-integrated age-structured stock assessment models, particularly when uncertainty in data or key assumptions are high, this comes at the cost of simplifying the model fishery and population dynamics. Over-simplification of key age-structured processes, or lack of spatiotemporal specificity could result in biases in data-moderate approaches. Taking multiple modeling approaches into account can give managers a holistic view on the likely status and trajectory of the stock.
3 Methods
3.1 Model Framework
In order to address some of the issues raised in Section 2, a series of Bayesian state-space surplus production models (BSPMs) spanning the period 1952-2022 were developed following the Fletcher-Schaefer production model framework (Edwards, 2024; Winker et al., 2020). Development of the BSPM followed the approach of (ISC, 2024; Neubauer et al., 2019) and incorporated recent best practices for surplus production models (Kokkalis et al., 2024) and Bayesian workflows for stock assessment (Monnahan, 2024) as guides for model development, analysis, and presentation.
The models were implemented in the Stan probabilistic programming language (Team, 2024b) to take advantage of enhanced convergence diagnostics, greater efficiency in posterior sampling through the no-U-turn (NUTS) Hamiltonian Monte Carlo algorithm (Betancourt & Girolami, 2013), and greater flexibility with model configuration and prior specification. Implementation in R using the rstan package (Team, 2024a) provides connection to an ecosystem of additional R packages for model diagnostics and validation: bayesplot (Gabry & Mahr, 2024) for posterior visualization and loo (Vehtari et al., 2024).
The model begins from an assumed unfished state in 1952 and incorporates process error in population dynamics, observation error in abundance indices, and uncertainty in biological parameters. Several model variants were developed: a baseline model (0001-2024cpueExPrior), a depletion-constrained model (0002-2024cpueDepPrior) that incorporates additional information about historical stock status from mean size data, and an extension of the depletion-constrained model that assumes a different prior on the variability in fishing mortality (0003-2024cpueFPrior). The 0002-2024cpueDepPrior and 0003-2024cpueFPrior models include a mean weight based prior on 1988 depletion, and more detail on the development of this prior can be found in Section 3.2.3. Unless otherwise mentioned, all models used the same prior parameterizations (Table 2) based on the biological simulation filtering described in Section 3.2.
3.1.1 Input data
The BSPM was fit to catch and standardized CPUE data for SWPO striped marlin. Annual catch data (in numbers) spanning 1952-2022 were compiled from the 2024 assessment (Castillo-Jordan et al., 2024) sources and aggregated into total removals by year.
The catch series (Figure 1) shows initial low removals in 1952-1953, followed by a high peak in 1954. Overall, however, catches have been relatively stable though catches in recent decades show a slight decline. A single standardized CPUE index (1988-2022) from the diagnostic case of the 2024 assessment (Castillo-Jordan et al., 2024) was used. The index is very noisy (Figure 2). In general, however, there is a slight declining trend over the index period. The decline was most pronounced after 2000, though the trend stabilized somewhat, prior to showing a slight increase in recent years.
3.1.2 Population Dynamics
The population dynamics follow a Fletcher-Schaefer surplus production model with state-space formulation where population depletion \(x_t\) (relative to carrying capacity \(K\)) evolves according to:
\[x_t = \begin{cases} (x_{t-1} + R_{max} x_{t-1} (1 - \frac{x_{t-1}}{h}) - C_{t-1}) \times \epsilon_t, & x_{t-1} \leq D_{MSY} \\ (x_{t-1} + x_{t-1}(\gamma \times m)(1 - x_{t-1}^{n-1}) - C_{t-1}) \times \epsilon_t, & x_{t-1} > D_{MSY} \end{cases}\]
where \(R_{max}\) is the maximum intrinsic rate of increase, \(n\) is the shape parameter, and \(\epsilon_t\) represents multiplicative process error with \(\epsilon_t = \exp(\delta_t - \sigma_P^2/2)\) and \(\delta_t \sim N(0, \sigma_P)\). The intermediate parameters are: \(D_{MSY} = (1/n)^{1/(n-1)}\), \(h = 2D_{MSY}\), \(m = R_{max}h/4\), and \(\gamma = n^{n/(n-1)}/(n-1)\).
3.1.3 Observation Model
The model fits to a standardized CPUE index using a lognormal likelihood:
\[I_t \sim \text{Lognormal}(\log(q \times x_t) - \frac{\sigma_{O,t}^2}{2}, \sigma_{O,t})\]
where catchability \(q\) is analytically derived assuming an uninformative uniform prior, and total observation error \(\sigma_{O,t} = \sigma_{O,input} + \sigma_{O,add}\) combines fixed input uncertainty with an estimated additional error component.
3.1.4 Catch Treatment
Annual fishing mortality is estimated directly \(F_t \sim N^+(0, \sigma_F)\) with catch fitted using lognormal likelihood with uncertainty \(\sigma_C = 0.2\)
Catch is predicted as: \[C_t = (x_t + \text{surplus production}_t) \times (1 - \exp(-F_t)) \times \epsilon_t \times K\]
3.2 Prior Development
3.2.1 Biological Simulation Framework
Informative priors for key population parameters were developed through biological simulation using Monte Carlo sampling from biologically plausible parameter distributions. The simulation framework incorporated uncertainty and correlations in life history parameters:
- Growth parameters: Francis parameterization with independent sampling
- Natural mortality: Reference mortality \(M_{ref}\) negatively correlated with maximum age (r = -0.3)
- Maturity: Length-based logistic function parameters
- Reproduction: Steepness parameter for stock-recruit relationship and sex ratio
- Weight-length relationship: Correlated allometric parameters (r = -0.5) with biological constraints
- Fishery selectivity: Length at first capture for depletion prior
Parameter combinations (Table 1) were filtered to ensure biological realism and included correlations between life history parameters reflecting biological trade-offs (e.g., shorter lived individuals having higher natural mortality rates).
3.2.2 Maximum Intrinsic Rate of Increase (\(R_{max}\)) Prior
\(R_{max}\) was calculated by numerically solving the Euler-Lotka equation using bounded optimization (nlminb function in R) to find the value of \(R_{max}\) that satisfies the equation within convergence criteria. Starting values and bounds were set to ensure biologically realistic solutions. Formulation of the Euler-Lotka equation followed the approach implemented in openMSE (Hordyk et al., 2024; Stanley et al., 2009):
\(\alpha \sum_{a=1}^{A_{max}} l_a f_a \exp(-R_{max} \times a) = 1\)
where \(\alpha\) is the slope at the origin of the stock-recruitment curve, \(l_a\) is the probability of survival to age \(a\), and \(f_a\) is the fecundity (reproductive output) at age \(a\). Additional technical detail can be found in Section 10.1 of the Technical Annex.
Repeating this calculation across each of the 500,000 parameter combinations resulted in a prior distribution of \(R_{max}\) conditioned on plausible biology and life-history characteristics. The resulting distribution was first filtered to values of \(R_{max} < 1.5\) representative of what the literature (Gravel et al., 2024; Hutchings et al., 2012) indicates is most likely for teleosts. A prior pushforward analysis, similar to (ISC, 2024), was used to further refine this distribution and develop a lognormal prior for \(R_{max}\). Briefly, random values of \(R_{max}\) along with random values of the other leading parameters of the BSPM (drawn from the prior distributions described in Table 2) were used to drive simulated populations forward subject to direct removal of the observed catch (Figure 1). Based on the results of the prior pushforward (Figure 3), the \(R_{max}\) distribution was further filtered based on the following criteria:
- Depletion in the terminal year \(>\) 2% and \(R_{max} < 1\)
- Depletion in the terminal year \(>\) 2%, \(R_{max} < 1\), and trend in depletion over the last decade between -5% and 10%
This last filtering step approximates the recent relative trend seen in the index (Figure 2). Fitting a lognormal distribution to the remaining \(R_{max}\) values results in the prior distribution specified in Table 2 and shown in Figure 4. Note that though initial filtering restricted \(R_{max}<1\) the final lognormal prior still assigns some prior weight to \(R_{max}>1\).
3.2.3 Depletion Prior
It is not possible to fit to size composition data in a BSPM framework, however information on fishing mortality and depletion may be contained in the long time series of New Zealand recreational fishing data. This fishery routinely catches the largest individuals and shows a decline in average size since the start of the model period (Figure 5), prior to which limited large-scale commercial fishing occurred. Assuming constant availability and selectivity, declining mean weight can be an indication of increased mortality. Assuming natural mortality M has remained constant, this indicates that the population may be in a more depleted state than in the initial period. The 0002-2024cpueDepPrior variant was developed to capitalize on this potential additional source of information.
A supplementary analysis was used to estimate relative population depletion in 1988 using New Zealand recreational fishing weight data. A modified Gedamke-Hoenig approach (Gedamke & Hoenig, 2006) for using size composition data to estimate mortality in non-equilibrium situations was applied to estimate change in total mortality from \(Z_1\) (initial period) to \(Z_2\) (final period) at a known change point (1952) when industrial fishing commenced at the start of the model period (more detail in Technical Annex Section 10.2). These two total mortality rates were calculated for each of the biological parameter combinations described in Section 3.2.1 and used to calculate relative spawning stock biomass depletion as the ratio of spawning potential under the two mortality regimes:
\[\text{Relative Depletion} = \frac{SPR_{Z_2}}{SPR_{Z_1}} = \frac{\sum_{a=1}^{A_{max}} l_{a,Z_2} f_a}{\sum_{a=1}^{A_{max}} l_{a,Z_1} f_a}\]
where \(l_{a,Z}\) is survival to age \(a\) under total mortality \(Z\), and \(f_a\) is reproductive output at age \(a\) (incorporating maturity, weight, sex ratio, and reproductive cycle). Additional technical detail on the population dynamics components of this equation can be found in Section 10.1 of the Technical Annex.
The resulting depletion estimates across successful parameter combinations were used to construct an informative lognormal prior for stock depletion that could be applied in the BSPM in 1988 (Figure 6).
3.2.4 Fishing mortality prior
An extension of the prior pushforward analysis described in Section 3.2.2 was applied to develop a prior for the variability in fishing mortality \(\sigma_F\). While in Section 3.2.2 the catch was subtracted directly, a random distribution of \(\sigma_F ~ \mathnormal{Normal}^+(0,0.1)\) values were used to develop time series of fishing mortality values. Based on the random fishing mortality values, catch was calculated and removed from the population in the same way as described in Section 3.1.4. Based on the results of the prior pushforward, the \(\sigma_F\) distribution was further filtered (Figure 7) based on the same criteria used in Section 3.2.2 with the additional criteria that:
- Depletion in the terminal year \(>\) 2%, \(R_{max} < 1\), trend in depletion over the last decade between -5% and 10%, and maximum simulated catch was less than 350,000.
This last filtering step was done to ensure that observed catch values fell within the \(95^{th}\) percentile of the simulated catch, and with a median simulated catch that approximated the magnitude of observed catches (Figure 8). Fitting a \(\mathnormal{Normal}^+\) distribution to the remaining \(\sigma_F\) values results in the prior distribution specified in Table 2 and shown in (Figure 9). While all parameters (e.g., \(R_{Max}\) and \(logK\)) were subject to this new filtering step only the \(\sigma_F\) distribution showed meaningful change. Therefore, when setting up the 0003-2024cpueFPrior model, the prior distributions for the non-\(\sigma_F\) parameters were assumed to be the same as the other two models.
3.3 Model Implementation
Models were implemented using the No-U-Turn Sampler (NUTS) in Stan. Each model was run using 5 independent Markov chains with randomly generated starting values to ensure robust exploration of the posterior parameter space. Each chain was sampled for 3,000 iterations, with the first 1,000 iterations discarded as warmup to allow for algorithm adaptation. To reduce posterior sample autocorrelation and storage requirements, every 10 samples was retained from the post-warmup iterations, yielding a final sample size of 1,000 draws from the joint posterior distribution across all chains. The NUTS algorithm was configured with strict adaptation parameters to ensure reliable convergence. The adaptation delta was set to 0.99 to reduce the step size and minimize divergent transitions, while the maximum tree depth was increased to 12 to allow for longer trajectories during the dynamic sampling process.
Model convergence and performance were assessed using multiple diagnostic criteria recommended for Bayesian stock assessment workflows (Monnahan, 2024). The potential scale reduction factor (\(\hat{R}\)) was required to be less than 1.01 for all parameters, indicating convergence across chains. Effective sample sizes were required to exceed 500 to ensure adequate precision in posterior estimates. Additionally, posterior predictive checks were conducted to evaluate model adequacy through comparison of observed versus predicted index and catch values, while parameter identifiability was assessed through visual examination of posterior distributions and correlation structures among estimated parameters (Gabry et al., 2019).
3.4 Model Diagnostics
Model performance was comprehensively evaluated using multiple diagnostic criteria recommended for Bayesian stock assessment workflows (Monnahan, 2024). The diagnostic framework included Stan model convergence criteria, data fits, posterior predictive checks, retrospective analysis, and hindcast cross-validation.
3.4.1 Convergence Diagnostics
Conventional Stan model convergence diagnostics and thresholds were employed to identify whether posterior distributions were representative and unbiased (Monnahan, 2024). Models were considered to have ‘converged’ to a stable, unbiased posterior distribution if they satisfied the following criteria:
- Potential scale reduction factor (\(\hat{R}\)) less than 1.01 for all leading model parameters, indicating convergence across chains
- Bulk effective sample size (\(N_{eff}\)) greater than 500 for all leading model parameters, ensuring adequate precision in posterior estimates
- No divergent transitions during the NUTS sampling process, which can indicate problematic posterior geometry
- Maximum tree depth not exceeded during sampling, ensuring adequate exploration of parameter space
Additionally, trace plots and rank plots were examined to visually assess mixing and convergence behavior across chains (Gabry et al., 2019).
3.4.2 Data Fits
Model fit to the abundance index was quantified using normalized root-mean-squared error (NRMSE):
\[RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}\]
\[NRMSE = \frac{RMSE}{\sum_{i=1}^{n} y_i / n}\]
where \(y_i\) are the observed values and \(\hat{y}_i\) are the model predictions. The average NRMSE across all posterior samples was calculated for each data component. Additionally, Probability Integral Transform (PIT) style residuals were calculated for the fit to the index.
3.4.3 Posterior Predictive Checks
Posterior predictive checks were conducted to evaluate model adequacy by comparing observed data with data simulated from the fitted model. For each posterior sample, synthetic datasets were generated using the estimated parameters, and agreement between observed and predicted datasets was assessed visually.
3.4.4 Retrospective Analysis
Retrospective analysis was conducted for each model by sequentially peeling off a year from the terminal end of the fitted index and re-running the model. Data were removed for each year up to five years from 2022 to 2018. Estimates of \(x\) in the terminal year of each retrospective peel were compared to the corresponding estimate of \(x\) from the full model run to better understand any potential biases or uncertainty in terminal year estimates. The Mohn’s \(\rho\) statistic (Mohn, 1999) was calculated and presented. This statistic measures the average relative difference between an estimated quantity from an assessment (e.g., depletion in final year) with a reduced time-series of information and the same quantity estimated from an assessment using the full time-series.
Additionally, following Kokkalis et al. (2024) the proportion of retrospective peels (or coverage) where the relative exploitation rate (\(U/U_{MSY}\)) and relative depletion (\(D/D_{MSY}\)) were inside the credible intervals of the full model run was calculated. This metric provides an additional perspective on retrospective consistency by evaluating whether the uncertainty estimates from the full model appropriately capture the range of estimates obtained when using reduced datasets.
3.4.5 Hindcast Cross-Validation
Hindcast cross-validation (Kell et al., 2021) was conducted for each index to determine the performance of the model to predict the observed CPUE \(I\) one-step-ahead into the future relative to a naïve predictor. Briefly, the ‘model-free’ approach to hindcast cross-validation was used, and made use of the same set of five retrospective peels described in Section 3.4.4.
The ‘model-free’ hindcast calculation is described using the model from the last peel \(BSPM_{2017}\) as an example. This model fit to index data through 2017 but included catch through 2022. The model estimates of predicted CPUE in 2018 based on \(BSPM_{2017}\) is the ‘model-free’ hindcast for 2018, \(\hat{I}_{2018}\). The naïve prediction of CPUE in 2018 is simply the observed CPUE from 2017, \(I_{2017} = \ddot{I}_{2018}\). The absolute scaled error (ASE) of the prediction is:
\(ASE_{2018} = \frac{|I_{2018} - \hat{I}_{2018}|}{|I_{2018} - \ddot{I}_{2018}|}\)
Repeating this calculation across all retrospective peels for years 2019-2022 and taking the average across ASE values gives the mean ASE or MASE for the model. An MASE value less than one indicates that the model has greater predictive skill than the naïve predictor.
3.5 Forecasts
Simple, status quo catch based stochastic projections over a 10 year window are provided for each model. For each model, projected catch was taken as the average catch over the terminal 5 years (2018-2022). Process error in the projection period was randomly resampled from the estimated process error in the main model period.
4 Results
4.1 Model Convergence and Diagnostics
All three BSPM model variants achieved satisfactory convergence based on standard Hamiltonian Monte Carlo diagnostics (Table 3). All models met all convergence criteria with \(\hat{R}\) values well below the 1.01 threshold and effective sample sizes exceeding the recommended minimum of 500 (100 samples per chain). No divergent transitions were observed across chains, and maximum tree depth was not exceeded during sampling, which did not indicate issues in exploring the posterior space.
4.1.1 Model Fit to Data
4.1.1.1 Catch Data Fit
All models were fit to the observed catch time series (1952-2022) with a coefficient of variation of 0.2 applied to account for uncertainty in catch estimates. Given that the fishing mortality required to fit the catch was also directly estimated, model fit to the catch was good (Figure 10). Notably, the models predicted lower than observed catch for the large catch event in 1954.
4.1.1.2 CPUE Index Fit
Model fit to the standardized CPUE index (1988-2022) was evaluated using normalized root-mean-squared error (NRMSE). All three models provided reasonable fits to the observed index given the high variability and observation error (Table 3). The 0003-2024cpueFPrior model showed marginally improved fit to the CPUE index, with lower RMSE values. In general, all models successfully captured the general declining trend in the standardized CPUE from 1988-2022. However, the estimated combination of observation and process error meant that the model was not pulled as tightly to each individual observation resulting in a persistant overfit in the last decade (e.g., negative residuals Figure 11). Posterior predictive checks confirmed that the models were able to replicate the key features of the observed CPUE data (Figure 2).
4.1.2 Model Validation
Overall retrospective bias was very low, and close to 0 for all models (Table 3; Figure 12) indicating a lack of persistant bias in terminal estimates as successive years of the index were removed from the model. Additionally, estimates of the relative exploitation rate (\(U/U_{MSY}\)) and relative depletion (\(D/D_{MSY}\)) were inside the credible intervals of the full model run 100% of the time (Table 3). In terms of hindcast cross validation, all models (though 0003-2024cpueFPrior model showed the best performance) exceeded the performance of the naïve predictor (Table 3; Figure 13) which provides some support that the model has identified and estimated a production function for the stock.
4.2 Population Dynamics and Stock Status
In all models, most leading parameters (log(K), \(R_{max}\), \(\sigma_P\), \(n\), \(\sigma_F\), and \(\sigma_{O,add}\)) showed posterior update from their respective prior distributions (Figure 14), indicating that there was sufficient information in the data to impact parameter estimates. The shape parameter \(n\) showed little deviation from the Schaefer-model prior. All models estimated similar levels of \(\sigma_P\) and additional observation error \(\sigma_{O,add}\) (Table 4) which is unsurprising given that they fit to the same catch and standardized index. Though estimates of \(\sigma_P\) were elevated from the prior, the model mainly reconciled the variability in the index by estimating substantial additional observation error \(\sigma_{O,add}\) resulting in a roughly 3:1 of total observation error \(\sigma_O\) to process error \(\sigma_P\). Similar to what was seen in ISC (2024), there was a trade-off between log(K) and \(\sigma_F\) where higher population scale was associated with lower fishing mortality and vice-versa.
Substantial posterior update was shown for the \(R_{max}\) parameter where all models estimated productivity on the lower end of the biological prior. These median estimates of \(R_{max}\), between 0.2 - 0.3 (Table 4), are not completely inconsistent, with estimates of \(R_{max}\) for an analagous species in the Atlantic Ocean, white marlin (Kajikia albida). Based on the most recent ICCAT assessment (ICCAT, 2020) implemented using JABBA (Winker et al., 2018), \(R_{max}\) was estimated to be 0.163 (0.122 - 0.215 95% credible interval) which is contained within the posterior estimates of all three models (Table 4). In the case of the Atlantic white marlin assessment, the JABBA model showed remarkable consistency with estimates of stock status and trend relative to a fully integrated, length structured model implemented in Stock Synthesis (ICCAT, 2020). Returning to the current analysis, a lower level of productivity \(R_{max}\) was estimated, with less uncertainty (Table 4), for the 0003-2024cpueFPrior model assuming a more depleted stock. Filtering the original biological prior to match the \(R_{max}\) posterior distribution from the 0003-2024cpueFPrior model using an inverse transform sampling approach identifies that this lower productivity is characterized by a pronounced shift to lower steepness values \(h\) and by a lesser extent to smaller values of von Bertalanffy \(k\) (Figure 15).
The estimated population depletion time series shows a decline from the assumed unfished state in 1952 to a minimum around the early 1990s (Figure 16). All three models estimated similar depletion trajectories, with differences in depletion related to estimated population scale. The 0003-2024cpueFPrior model estimated slightly lower depletion levels throughout the time series compared to the other two models. From the minimum depletion period, all models estimated a gradual recovery in population size through the 2000s and 2010s, with the population stabilizing at moderate depletion levels in recent years. The credible intervals around the depletion estimates are relatively wide, particularly in the early period when no abundance index data are available to inform the model, reflecting the inherent uncertainty in reconstructing historical population dynamics. The estimated population time series mirrors the depletion patterns. The 0001-2024cpueExPrior model estimated the highest population scale throughout the time series, while the 0002-2024cpueDepPrior and 0003-2024cpueFPrior models estimated lower population levels. The uncertainty in absolute population size is substantial, as reflected in the wide credible intervals, though the relative trends are more precisely estimated.
Posterior estimates of depletion in 1988 differed notably among the three models (Table 4), and in the case of the 0002-2024cpueDepPrior and 0003-2024cpueFPrior models differed from the prior \(x_{1988} =\) 0.637 value (Table 2). The baseline model (0001-2024cpueExPrior) estimated median depletion of \(x_{1988} =\) 0.87 (0.59-1.06), while the depletion-constrained model (0002-2024cpueDepPrior) estimated median \(x_{1988} =\) 0.75 (0.56-0.94). The 0003-2024cpueFPrior model with a more representative prior on fishing mortality variability estimated the greatest depletion levels in 1988, \(x_{1988} =\) 0.67 (0.50-0.89).
Process error estimates (Figure 16) showed consistent temporal variability across models, with the largest deviations occurring during periods of rapid population change in the 1970s and 1980s. This indicates that removals alone were not able to explain the observed changes in the standardized index. The estimated process error standard deviation (Table 4) while similar across all models, was lowest for the 0003-2024cpueFPrior model indicating perhaps a greater internal consistency in the model dynamics.
4.3 Biological Reference Points
MSY-based reference points were derived from the Fletcher-Schaefer production function for all three BSPM variants. The relative stock status time series (Figure 16) shows consistent trends across all three models, albeit at different relative levels. However median estimates of the \(D/D_{MSY}\) ratio indicates the stock was above the MSY reference level for the entire model period. Only the 0003-2024cpueFPrior indicates a risk of being below the MSY-based reference point in recent years. Similarly, median estimates of \(F/F_{MSY}\) ratio were all consistently below the overfishing reference point with episodic periods of risk of overfishing indicated by the 0003-2024cpueFPrior model. However, the recent trend indicates a declining risk in overfishing consistent with the estimated recent recovery of the stock.
4.4 Future Projections
Ten-year stochastic projections (2023-2032) were conducted for all three BSPM variants under a status quo catch scenario, with projected removals set to the average catch over 2018-2022 (Figure 17). Under status quo catch levels, all models projected continued population recovery through 2032. The depletion trajectories show gradual increases from current levels, with the 0001-2024cpueExPrior model projecting the most optimistic recovery due to its higher estimated population scale. The 0002-2024cpueDepPrior and 0003-2024cpueFPrior models showed more modest but consistent increases in depletion over the projection period.
Relative to MSY-based reference points, the projections indicate the stock will likely remain above \(D_{MSY}\) throughout the projection period for the baseline and depletion-constrained models. The 0003-2024cpueFPrior model showed greater variability and some risk of declining below \(D_{MSY}\) in the near term, though the median trajectory suggested stabilization above the reference point by the end of the projection period. Projected fishing mortality rates remained well below \(F_{MSY}\) for all models under the status quo catch scenario, indicating that current harvest levels are likely sustainable and consistent with continued stock recovery. The uncertainty in projections increased over time, reflecting both parameter uncertainty and stochastic variability in future population dynamics.
5 Discussion
5.1 General remarks
All three BSPM model variants estimated consistent depletion trajectories showing a decline from the assumed unfished state in 1952 to a minimum around the early 1990s, followed by gradual recovery through the 2000s and 2010s. Median estimates of the \(D/D_{MSY}\) ratio indicate the stock was above the MSY reference level for the entire model period, though the 0003-2024cpueFPrior model indicates some risk of being below the MSY-based reference point in recent years. Similarly, median estimates of \(F/F_{MSY}\) ratio were consistently below the overfishing reference point, with the recent trend indicating declining risk of overfishing consistent with estimated stock recovery.
While the BSPM can represent the best available scientific information absent a satisfactorily performing integrated age-structured model, caution is needed in interpreting results given the simplifying assumptions made related to population and fishery dynamics. However, the BSPM approach offers advantages relative to challenges encountered in the integrated age-structured assessment. One of the main challenges with the integrated assessment was the estimation of population scale. Estimates of scale from the integrated assessment were largely influenced by size composition data that depend on the correct specification of growth, mortality, length-weight, selectivity, and reproduction parameters. If productivity assumptions are mis-specified or fishery selectivity patterns are poorly understood, the scale information from size composition data in the integrated assessment could be wrong. In this way a production model can be advantageous as it condenses uncertainty in those assumptions into a few parameters to more efficiently explore the productivity space.
5.2 Challenges, limitations & key uncertainties
The current analysis faces several important limitations. The strong assumption of no age-specific population dynamics can be of concern depending on the relationship between mortality (both natural and fishing mortality) at age and reproductive potential at age. The simplified population dynamics also provide no framework for incorporating biological information beyond what is captured in the production function parameters, or for explicitly including other sources of information (e.g., size composition data). Additionally, the aggregated nature of surplus production models can potentially bias MSY estimates, particularly when age-specific processes differ significantly from the assumptions implicit in the aggregate production function. For species with complex life histories like striped marlin, these limitations should be considered when using MSY-based reference points derived from surplus production models for management decisions.
On the data availability side, the model is fit to a single standardized index of abundance, which limits the ability to evaluate uncertainty in abundance trends. If alternative indices were available, a model ensemble approach could better characterize this uncertainty. Furthermore, the incorporation of a depletion signal from New Zealand recreational size composition by putting a prior on depletion in 1988, while novel, is crude and less elegant than other modeling approaches such as delay-difference or age-structured population models which can explicitly fit in a likelihood-based framework to time series of average size or actual size composition data.
In terms of modeling assumptions, current modeling assumes the same level of catch uncertainty \(\sigma_C\) across all time periods. Future versions of the model could be refined to allow time-varying catch uncertainty (e.g., allowing for greater uncertainty in historical catch observations relative to recent observations) or for allowing for effort-driven removals similar to the catch-errors with effort deviation approach available in MULTIFAN-CL (Fournier et al., 1998).
Lastly, population scale estimates were highly uncertain and sensitive to model configuration choice, while estimates of population depletion were more precisely estimated. The choice of prior on \(\sigma_F\) was particularly influential, and while this prior was developed sensibly using prior pushforward analysis (Kim & Neubauer, 2025), it represents a key uncertainty. Similarly, uncertainty in key biological parameters (growth, maturity, natural mortality) due to limitations in the number of sample sizes available for analysis contributed to the broad initial prior for \(R_{max}\). While the posterior distribution for \(R_{max}\) was substantially updated from the prior, reducing uncertainty in key biological productivity parameters could help reduce model uncertainty estimates and facilitate development of age-structured models.
5.3 Future stock assessment modeling considerations
Future assessment efforts should build back up to full age-structure through a progressive approach where complexity is increased in turn: delay-difference model, age-structured production model, and then fully integrated age-structured model. This development should occur in a Bayesian context to inform parameter estimates where prior information exists or can be derived using a prior-pushforward approach (Kim & Neubauer, 2025). Developing a Bayesian age-structured modelling approach can provide for more stable estimation of an increased number of parameters using priors, more explicitly account for uncertainty in parameters rather than making fixed assumption, potentially allow for informed estimation of difficult-to-observe biological relationships (e.g., stock-recruit), address issues with the BSPM by explicitly consider age-structured processes and fishery selectivity, and estimate leading parameters with likelihood components for indices and size composition.
While simplified models offer advantages when data conflicts exist, over-simplification of age-structured processes could result in biases. Future modeling must balance complexity with data quality and conflicts. The transition should be guided by careful evaluation of whether increased model complexity is supported by available data and whether it improves the reliability of management advice.
6 Declaration of Generative AI use
A generative artificial intelligence (AI) assistant, Anthropic Claude Sonnet 4.0, was used to parse model code in the preparation of an initial draft of this report.
7 References
8 Tables
| Parameter | Symbol | Distribution | Mean/Mode | CV/Range | Correlation | Source/Notes |
|---|---|---|---|---|---|---|
| Length at age 1 | \(L_1\) | Lognormal | 60 cm | 0.2 | Independent | (Ducharme-Barth et al., 2025) |
| Length at age 10 | \(L_2\) | Lognormal | 210 cm | 0.2 | Independent | (Ducharme-Barth et al., 2025) |
| Growth coefficient | \(k\) | Beta(6.5, 3.5) | ~0.65 | - | Independent | (Ducharme-Barth et al., 2025) |
| Length CV | \(cv_{len}\) | Uniform | - | [0.05, 0.25] | - | Growth variability |
| Maximum age | \(A_{max}\) | Lognormal | 15 years | 0.2 | \(\rho\) = -0.3 with \(M_{ref}\) | (Farley et al., 2021) |
| Reference mortality | \(M_{ref}\) | Lognormal | 0.36 \(yr^{-1}\) | 0.44 | \(\rho\) = -0.3 with \(A_{max}\) | (Hamel & Cope, 2022) (5.4/\(A_{max}\)) |
| Maturity slope | \(a_{mat}\) | Normal | -20 | 0.2 | - | Logistic steepness |
| Length at 50% maturity | \(L_{50}\) | Lognormal | 184 cm | 0.2 | - | (Farley et al., 2021) |
| Weight coefficient | \(a_w\) | Lognormal | \(5.4\times10^{-7}\) | 0.05 | \(\rho\) = -0.5 with \(b_w\) | (Castillo-Jordan et al., 2024) |
| Weight exponent | \(b_w\) | Normal | 3.58 | 0.05 | r = -0.5 with \(a_w\) | (Castillo-Jordan et al., 2024) Constrained: 150-350 kg at 300 cm |
| Steepness | \(h\) | Beta(3, 1.5) | ~0.73 | - | - | Censored to [0.2, 1.0] |
| Sex ratio (prop. female) | \(s\) | Normal | 0.5 | 0.05 | - | Constrained [0.01, 0.99] |
| Reproductive cycle | - | Fixed | 1 year | - | - | Annual spawning |
| Length at first capture | \(L_c\) | Lognormal | 140 cm | 0.125 | - | Fishery selectivity |
| Reference ages | \(age_1\), \(age_2\) | Fixed | 0, 10 years | - | - | Francis parameterization |
| Parameter | Distribution | Mean | SD | Description |
|---|---|---|---|---|
| Core Population Parameters | ||||
| \(K\) | Lognormal | \(\log(1,049,036)\) | 0.46 | Carrying capacity |
| \(R_{max}\) | Lognormal | \(\log(0.5099)\) | 0.46 | Maximum intrinsic rate of increase (from biological simulation) |
| \(n\) | Lognormal | \(\log(2)\) | 0.1 | Shape parameter (Pella-Tomlinson production function); \(n = 2\) is a Schaefer model |
| Process and Observation Error | ||||
| \(\sigma_P\) | Lognormal | \(\log(0.0533)\) | 0.27 | Process error standard deviation (Winker et al., 2018) |
| \(\sigma_{O,add}\) | Half-Normal | 0 | 0.2 | Additional estimated observation error |
| \(\sigma_F\) | Half-Normal | 0 | 0.025 | Fishing mortality variability (0001-2024cpueExPrior & 0002-2024cpueDepPrior) |
| \(\sigma_F\) | Half-Normal | 0 | 0.047 | Fishing mortality variability (0003-2024cpueFPrior) |
Depletion Constraint (0002-2024cpueDepPrior & 0003-2024cpueFPrior) |
||||
| \(x_{1988}\) | Lognormal | \(\log(0.6374)\) | 0.2 | Initial depletion at \(t=37\) (this analysis) |
| Model | Max \(\hat{R}\) | Min ESS | Divergent | Tree Depth | BFMI Issues | Index RMSE | Mohn’s \(\rho\) | MASE | Coverage \(D/D_{MSY}\) | Coverage \(U/U_{MSY}\) |
|---|---|---|---|---|---|---|---|---|---|---|
0001 |
1.009 | 724 | 0 | 0 | 0 | 0.28 | 0.006 | 0.966 | 100% | 100% |
0002 |
1.009 | 674 | 0 | 0 | 0 | 0.27 | -0.005 | 0.863 | 100% | 100% |
0003 |
1.008 | 728 | 0 | 0 | 0 | 0.265 | -0.026 | 0.779 | 100% | 100% |
Notes:
0001 corresponds to the 0001-2024cpueExPrior model
0002 corresponds to the 0002-2024cpueDepPrior model
0003 corresponds to the 0003-2024cpueFPrior model
- Max \(\hat{R}\): Maximum potential scale reduction factor across all parameters (should be < 1.01)
- Min ESS: Minimum effective sample size across all parameters (should be > 500)
- Divergent: Number of divergent transitions during sampling (should be 0)
- Tree Depth: Whether maximum tree depth was exceeded during sampling
- BFMI Issues: Whether low Bayesian Fraction of Missing Information was detected
- Index RMSE: Root mean squared error for fit to standardized CPUE index
- Mohn’s \(\rho\): Retrospective bias statistic (values close to 0 indicate low bias)
- MASE: Mean Absolute Scaled Error from hindcast cross-validation (< 1 indicates model outperforms naïve predictor)
- Coverage \(D/D_{MSY}\): Percentage of retrospective peels where relative depletion estimates fall within full model credible intervals
- Coverage \(U/U_{MSY}\): Percentage of retrospective peels where relative exploitation rate estimates fall within full model credible intervals
| Model | log(K) | \(R_{max}\) | \(\sigma_P\) | Shape (\(n\)) | \(\sigma_F\) | \(\sigma_{O,add}\) | \(\sigma_{O}\) | \(x_{1988}\) |
|---|---|---|---|---|---|---|---|---|
0001 |
13.8 (13.2-14.6) | 0.27 (0.13-0.76) | 0.07 (0.04-0.11) | 1.97 (1.62-2.37) | 0.03 (0.01-0.07) | 0.10 (0.04-0.19) | 0.27 (0.20-0.35) | 0.87 (0.59-1.06) |
0002 |
13.7 (13.2-14.5) | 0.23 (0.12-0.56) | 0.07 (0.04-0.12) | 1.96 (1.62-2.35) | 0.04 (0.02-0.08) | 0.09 (0.03-0.17) | 0.26 (0.20-0.34) | 0.75 (0.56-0.94) |
0003 |
13.3 (12.8-14.0) | 0.23 (0.13-0.45) | 0.06 (0.03-0.10) | 1.97 (1.65-2.38) | 0.07 (0.03-0.13) | 0.09 (0.03-0.16) | 0.26 (0.19-0.33) | 0.67 (0.50-0.89) |
Parameter Definitions:
0001 corresponds to the 0001-2024cpueExPrior model
0002 corresponds to the 0002-2024cpueDepPrior model
0003 corresponds to the 0003-2024cpueFPrior model
- log(K): Natural logarithm of carrying capacity (total numbers)
- \(R_{max}\): Maximum intrinsic rate of population increase (per year)
- \(\sigma_P\): Process error standard deviation
- Shape (\(n\)): Pella-Tomlinson production function shape parameter
- \(\sigma_F\): Fishing mortality variability standard deviation
- \(\sigma_{O,add}\): Extra estimated observation error component
- \(\sigma_{O}\): Total observation error (input + additional)
- \(x_{1988}\): Population depletion in 1988 (year 37 of model period)
9 Figures
0001-2024cpueExPrior and 0002-2024cpueDepPrior models is shown in green.
0001-2024cpueExPrior (baseline model with expert priors), the central panel shows model 0002-2024cpueDepPrior (depletion-constrained model incorporating 1988 depletion prior from New Zealand recreational weight data), the right panel shows model 0003-2024cpueFPrior which assumes a less restrictive prior on fishing mortality variability. Colored lines represent posterior median predictions with shaded uncertainty bands showing 95% credible intervals.
10 Technical Annex
Additional technical detail of population dynamics components used to solve the Euler-Lotka equation for \(R_{max}\) and implement the Gedamke-Hoenig approach (Gedamke & Hoenig, 2006) for using size composition data to estimate mortality in non-equilibrium situations.
10.1 Equilibrium Age-Structured Population Dynamics
10.1.1 Survival
Survival schedule was calculated assuming constant natural mortality:
\(l_a = \begin{cases} 1 & \text{if } a = 1 \\ l_{a-1} \times \exp(-M_{ref}) & \text{if } 1 < a < A_{max} \\ \frac{l_{A_{max}-1}}{1-\exp(-M_{ref})} & \text{if } a = A_{max} \text{ (plus group)} \end{cases}\)
10.1.2 Fecundity
Reproductive output incorporated biological constraints: \[f_a = \frac{\psi_a \times W_a \times s}{\rho}\]
where: - \(\psi_a\) = proportion mature at age \(a\) - \(W_a\) = weight at age \(a\) (proxy for reproductive capacity) - \(s\) = sex ratio (proportion female) - \(\rho\) = reproductive cycle (years between spawning events)
Maturity-at-age was derived from length-based logistic maturity: \[\psi_L = \frac{\exp(a_{mat} + b_{mat} \times L)}{1 + \exp(a_{mat} + b_{mat} \times L)}\] where \(b_{mat} = -a_{mat}/L_{50}\), then integrated over the length-at-age distribution to obtain \(\psi_a\).
Length-at-age followed the Francis parameterization: \[L_a = L_1 + (L_2 - L_1) \times \frac{1 - \exp(-k(a - age_1))}{1 - \exp(-k(age_2 - age_1))}\]
with length variability modeled using probability density functions linking length bins to ages.
Weight-at-age used the allometric relationship: \[W_a = a_w \times L_a^{b_w}\]
10.1.3 Stock-Recruitment Relationship
The slope at the origin of the stock-recruitment curve was: \[\alpha = \frac{4h}{(1-h)\phi_0}\]
The calculation incorporated steepness (\(h\)) through the Beverton-Holt stock-recruitment relationship. Spawners-per-recruit in the unfished state was: \[\phi_0 = \sum_{a=1}^{A_{max}} l_a f_a\]
This \(\alpha\) parameter links the intrinsic rate of increase to recruitment compensation, ensuring that \(R_{max}\) estimates reflect realistic population productivity under the assumed stock-recruitment dynamics.
Successful simulations (those yielding \(R_{max} > 0\) and \(R_{max} < 1.5\)) were filtered to create a plausible prior distribution (Figure 3). The resulting distribution was fitted to a lognormal prior for use in the BSPM (Figure 4).
10.2 Transitional Mean Weight Model
For each biological parameter combination, total mortality rates \(Z_1\) and \(Z_2\) were estimated by fitting a transitional mean weight model to observed temporal changes in recreational catch weights. The expected mean weight during the transition follows:
\[\bar{W}(d) = W_{\infty} - \frac{Z_1 Z_2 (W_{\infty} - W_c) [Z_1 + k + (Z_2 - Z_1)e^{-(Z_2+k)d}]}{(Z_1 + k)(Z_2 + k)[Z_1 + (Z_2 - Z_1)e^{-Z_2 d}]}\]
where: - \(d\) = time since mortality change (years after 1952) - \(W_{\infty}\) = asymptotic weight from growth parameters - \(W_c\) = weight at first capture (corresponding to \(L_c\)) - \(k\) = von Bertalanffy growth coefficient
The mortality parameters were estimated in a likelihood framework by maximizing the likelihood of the observed mean weights given a time series of predicted mean weights \(\bar{W}(d)\). Optimization was constrained such that \(Z_1, Z_2 \geq M_{ref}\) to ensure biologically realistic mortality estimates.